Logistic regression is one of the simplest classification models. Its most basic form classifies a given set of data points into two possible classes, usually labelled 0 and 1. The logistic regression model thus predicts an output y in {0,1}, given an input vector x. The probability is modeled using the logistic function $$ g(z)=\frac{1}{1+e^{-z}} $$ Namely, the probability of the output being y=1 is given by $$ q_{y=1}\ =\ \hat{y}\ \equiv\ g(\mathbf{w}\cdot\mathbf{x} + b)\,,$$ where w is the weight vector and b is the bias, while the probability of the output being y=0 is given by $$ q_{y=0} = 1 - q_{y=1} $$
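As a quick illustration of the formulas above, the following NumPy sketch evaluates the logistic function and the two class probabilities for a single input. The helper names `logistic` and `predict_proba`, as well as the weights, bias, and input, are hypothetical values chosen only for this example.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Return (q_{y=0}, q_{y=1}) for one input vector x."""
    q1 = logistic(np.dot(w, x) + b)
    return 1.0 - q1, q1

# Hypothetical weights, bias, and input, used only for illustration.
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])

q0, q1 = predict_proba(x, w, b)
print("q(y=0) =", q0, "q(y=1) =", q1)
```

Here w·x + b = 0.1, so the model assigns slightly more than 50% probability to the class y=1.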
The weights w and the bias b are usually learned in the training step using an optimization algorithm such as gradient descent.
The typical loss function used in logistic regression is the average of the cross-entropies over all samples. For N samples, with p_n the true label distribution and q_n the predicted distribution of sample n, the loss function is given by: $$L(w)=\frac{1}{N}\sum_{n=1}^{N}H(p_{n},q_{n})=-\frac{1}{N}\sum_{n=1}^{N}\ \bigg[y_{n}\log \hat{y}_{n}+(1-y_{n})\log(1-\hat{y}_{n})\bigg]$$
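The cross-entropy loss and the gradient descent training mentioned earlier can be sketched together in plain NumPy. This is a minimal sketch on a made-up, one-dimensional dataset, not the TensorFlow code used later in this notebook; the function names, the learning rate, and the number of iterations are assumptions chosen for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Average binary cross-entropy over N samples."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def gradient_step(X, y, w, b, learning_rate):
    """One gradient descent update of (w, b) on the loss above."""
    y_hat = logistic(X @ w + b)
    error = y_hat - y                  # derivative of the loss w.r.t. w.x + b
    grad_w = X.T @ error / len(y)
    grad_b = np.mean(error)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Toy data (hypothetical): small inputs belong to class 0, large ones to class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = np.zeros(1), 0.0
initial_loss = cross_entropy_loss(y, logistic(X @ w + b))
for _ in range(500):
    w, b = gradient_step(X, y, w, b, learning_rate=0.5)
final_loss = cross_entropy_loss(y, logistic(X @ w + b))

print("loss before training:", initial_loss)
print("loss after training: ", final_loss)
```

With all parameters at zero every prediction is 0.5, so the initial loss equals log 2; the gradient steps then reduce it.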
In this example we will use the MNIST database of handwritten digits provided with the tensorflow package. The labels in MNIST are numbers between 0 and 9, indicating which digit a given image shows. Because there are more than two classes, we represent the labels as "one-hot vectors". A one-hot vector is 0 in most dimensions and 1 in a single dimension. In this case, the digit n is represented as a vector which is 1 in the nth dimension (counting from 0). For example, 3 would be [0,0,0,1,0,0,0,0,0,0].
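The one-hot representation described above can be reproduced with a few lines of NumPy. The helper `one_hot` below is a hypothetical function written for this illustration; in the notebook itself the encoding is done by the MNIST loader through `one_hot=True`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot row vectors."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])
print(encoded[0])               # one-hot vector for the digit 3
print(encoded.argmax(axis=1))   # argmax recovers the original labels
```

Taking the argmax of a one-hot vector returns the original digit, which is how the predictions are compared with the labels when the accuracy is computed.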
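In the multiclass case, w becomes a matrix with one column per class and b a vector with one entry per class. The output is given by: $$ \hat{y} = \mathrm{softmax}(g(w\cdot x + b))$$ In practice the logistic function is dropped and the output simplifies to: $$ \hat{y} = \mathrm{softmax}(w\cdot x + b)$$ The loss is defined as: $$ L(w) = \frac{1}{N}\sum_{n=1}^{N}H(p_{n},q_{n})=-\frac{1}{N}\sum_{n=1}^{N}\sum_{k}y_{n,k}\log \hat{y}_{n,k}$$ where k runs over the classes.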
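To make the softmax output and the multiclass loss concrete, here is a small NumPy sketch. The scores and labels are made-up numbers for two samples and three classes, and the function names are assumptions for this illustration; the TensorFlow cells below compute the same quantities with `tf.nn.softmax` and `tf.reduce_mean`.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """Average over samples of -sum_k y[n, k] * log(y_hat[n, k])."""
    return np.mean(-np.sum(y * np.log(y_hat + eps), axis=1))

# Hypothetical scores w.x + b for 2 samples and 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
# One-hot labels: sample 0 is class 0, sample 1 is class 2.
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(scores)
loss = categorical_cross_entropy(labels, probs)
print("probabilities:\n", probs)
print("loss:", loss)
```

Each row of `probs` sums to 1, and equal scores (the second row) give equal probabilities for all classes.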
In [1]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("tmp/data/", one_hot=True)
In [7]:
#import tensorflow
import tensorflow as tf
import numpy as np
# tf Graph Input
X = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes
# Create model
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Construct model
y_pred = tf.nn.softmax(tf.add(tf.matmul(X, W),b)) # Softmax
In [8]:
# Define Training Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
In [9]:
# Minimize error using cross entropy
# Cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(y_pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
In [13]:
# Initializing the variables
# (tf.global_variables_initializer replaces the deprecated tf.initialize_all_variables)
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data and fetch the batch loss in the same run
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch: {:04d} cost= {:.9f}".format(epoch + 1, avg_cost))
    print("Optimization Finished!")
    # Test model
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy: {}".format(accuracy.eval({X: mnist.test.images, y: mnist.test.labels})))